67 research outputs found
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and its use for improving spoken language understanding (SLU)
systems. In this study, the SLU task consists in automatically extracting
semantic concepts and concept/value pairs from ASR transcriptions, e.g. in a
tourist information system. An approach is proposed for enriching the set of
semantic labels with error-specific labels, using a recently proposed neural
approach based on word embeddings to compute well-calibrated ASR confidence
measures. Experimental results show that the Concept/Value Error Rate can be
significantly decreased with a state-of-the-art system, outperforming
previously published results on the same experimental data. It is also shown
that by combining an SLU approach based on conditional random fields with a
neural encoder/decoder attention-based architecture, it is possible to
effectively identify confidence islands and uncertain semantic output
segments, useful for deciding on appropriate error-handling actions in the
dialogue manager strategy.
Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
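The label-enrichment and calibration ideas above can be illustrated with a minimal sketch. All names here are hypothetical, not the paper's implementation: the label suffix and the temperature value are made up, and the confidence function is a generic temperature-scaled sigmoid standing in for the neural confidence model.

```python
# Hypothetical sketch, not the paper's code: enrich a semantic label set
# with error-specific variants and calibrate a raw classifier score.
import math

def enrich_labels(labels):
    """Add an error-specific variant for each semantic label."""
    enriched = []
    for lab in labels:
        enriched.append(lab)           # concept hypothesised on correct ASR
        enriched.append(lab + "_ERR")  # concept hypothesised on an ASR error
    return enriched

def calibrated_confidence(score, temperature=2.0):
    """Temperature-scaled sigmoid: a simple stand-in for a calibrated
    ASR confidence measure (temperature=2.0 is an arbitrary choice)."""
    return 1.0 / (1.0 + math.exp(-score / temperature))

labels = enrich_labels(["command", "localisation"])
print(labels)  # ['command', 'command_ERR', 'localisation', 'localisation_ERR']
print(calibrated_confidence(1.5))
```

Doubling the label set this way lets a single sequence tagger predict the concept and the ASR reliability jointly, which is the intuition behind the approach described above.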
Benchmarking Transformers-based models on French Spoken Language Understanding tasks
In the last five years, the rise of self-attentional Transformer-based
architectures has led to state-of-the-art performance on many natural language
tasks. Although these approaches are increasingly popular, they require large
amounts of data and computational resources. There is still a substantial need
for benchmarking methodologies on under-resourced languages in data-scarce
application conditions. Most pre-trained language models have been massively
studied on English, and only a few have been evaluated on French. In this
paper, we propose a unified benchmark focused on evaluating model quality and
ecological impact on two well-known French spoken language understanding
tasks. In particular, we benchmark thirteen well-established Transformer-based
models on the two spoken language understanding tasks available for French:
MEDIA and ATIS-FR. Within this framework, we show that compact models can
reach results comparable to bigger ones while their ecological impact is
considerably lower. However, this finding is nuanced and depends on the
compression method considered.
Comment: Accepted paper at INTERSPEECH 202
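The trade-off the benchmark highlights (comparable quality at much lower ecological cost) can be sketched as a simple selection rule. The model names, scores, and footprint figures below are entirely fictitious placeholders, not results from the benchmark:

```python
# Illustrative only: pick the lowest-footprint model whose score stays
# within `tolerance` points of the best score. All numbers are made up.

def frugal_choice(results, tolerance=1.0):
    """results: list of (name, score, grams_co2e) tuples."""
    best = max(score for _, score, _ in results)
    eligible = [r for r in results if best - r[1] <= tolerance]
    return min(eligible, key=lambda r: r[2])

models = [
    ("large-model",   89.4, 120.0),  # fictitious entries
    ("base-model",    89.1,  40.0),
    ("compact-model", 88.7,  12.0),
]
print(frugal_choice(models))  # ('compact-model', 88.7, 12.0)
```

As the abstract cautions, such a rule only holds when the compression method behind the compact model actually preserves task quality, so the tolerance must be validated per task.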
Lifelong learning and task-oriented dialogue system: what does it mean?
The main objective of this paper is to propose a functional definition of a lifelong learning system adapted to the framework of task-oriented systems. We identify two main aspects where lifelong learning technology could be applied in such a system: improving the natural language understanding module and enriching the database used by the system. Given our definition, we present an example of how it could be implemented in an actual task-oriented dialogue system developed in the LIHLITH project.
Semantic enrichment towards efficient speech representations
Over the past few years, self-supervised learned speech representations have
emerged as fruitful replacements for conventional surface representations when
solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual
models trained on massive textual data were introduced to encode
language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way
to benefit from such textual models to enrich multilingual speech
representations with language-agnostic semantics. Aiming for better semantic
extraction on a challenging Spoken Language Understanding task, and with
computation costs in mind, this study investigates a specific in-domain
semantic enrichment of the SAMU-XLSR model by specializing it on a small amount
of transcribed data from the downstream task. In addition, we show the benefits
of using same-domain French and Italian benchmarks for low-resource language
portability and explore cross-domain capacities of the enriched SAMU-XLSR.
Comment: INTERSPEECH 202
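The core idea of this family of approaches, pulling a pooled speech representation towards a language-agnostic text sentence embedding, can be sketched with a similarity objective. This is a hedged illustration only: the vectors below are random stand-ins, not outputs of SAMU-XLSR or any text encoder, and the loss is a generic cosine-distance objective.

```python
# Sketch of a similarity-based enrichment objective: minimise
# 1 - cos(speech_vec, text_vec). Vectors are random placeholders.
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / ((dot(u, u) ** 0.5) * (dot(v, v) ** 0.5))

def alignment_loss(speech_vec, text_vec):
    """0 when the two vectors point the same way, 1 when orthogonal."""
    return 1.0 - cosine(speech_vec, text_vec)

random.seed(0)
speech = [random.gauss(0, 1) for _ in range(8)]  # stand-in speech embedding
text = [random.gauss(0, 1) for _ in range(8)]    # stand-in text embedding
print(alignment_loss(speech, text))
```

Specializing on a small amount of in-domain transcribed data, as the abstract describes, amounts to continuing training with such an objective on downstream-task pairs only.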
Étude sur les représentations continues de mots appliquées à la détection automatique des erreurs de reconnaissance de la parole (A study of continuous word representations applied to the automatic detection of speech recognition errors)
My thesis concerns a study of continuous word representations (word embeddings) applied to the automatic detection of speech recognition errors. Our study focuses on the use of a neural approach to improve ASR error detection using word embeddings. The exploitation of continuous word representations is motivated by the fact that ASR error detection consists in locating possible linguistic or acoustic incongruities in automatic transcriptions. The aim is therefore to find the appropriate word representation that captures pertinent information in order to detect these anomalies. Our contribution in this thesis spans several axes. First, we start with a preliminary study in which we propose a neural architecture able to integrate different types of features, including word embeddings. Second, we propose an in-depth study of continuous word representations, which evaluates different types of linguistic word embeddings and their combinations in order to take advantage of their complementarities, and also considers acoustic word embeddings. Then, we present an analysis of classification errors, with the aim of identifying the errors that are difficult to detect. Perspectives for improving the performance of our system are also proposed, by modeling errors at the sentence level. Finally, we exploit the linguistic and acoustic embeddings, as well as the information provided by our ASR error detection system, in several downstream applications.
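The first axis above, a classifier that integrates heterogeneous features including embeddings, can be sketched minimally. This is not the thesis architecture: the feature dimensions and weights below are random placeholders, and a single logistic unit stands in for the neural network.

```python
# Hypothetical sketch: score a word as a possible ASR error from the
# concatenation of a linguistic and an acoustic word embedding.
import math
import random

def detect_error(linguistic_emb, acoustic_emb, weights, bias=0.0):
    """Return a score in (0, 1) for P(word is an ASR error)."""
    features = linguistic_emb + acoustic_emb  # feature concatenation
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

random.seed(1)
ling = [random.gauss(0, 1) for _ in range(4)]   # stand-in linguistic embedding
acou = [random.gauss(0, 1) for _ in range(4)]   # stand-in acoustic embedding
w = [random.gauss(0, 0.5) for _ in range(8)]    # untrained placeholder weights
p = detect_error(ling, acou, w)
print(0.0 <= p <= 1.0)  # True
```

The point of the concatenation is that linguistic and acoustic embeddings capture complementary incongruities, which is why the thesis studies their combination.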
- …